Reducing Parallel Overheads Through Dynamic Serialization

Authors

  • Michael Voss
  • Rudolf Eigenmann
Abstract

If parallelism can be successfully exploited in a program, significant reductions in execution time can be achieved. However, if sections of the code are dominated by parallel overheads, the overall program performance can degrade. We propose a framework, based on an inspector-executor model, for identifying loops that are dominated by parallel overheads and dynamically serializing these loops. We implement this framework in the Polaris parallelizing compiler and evaluate two portable methods for classifying loops as profitable or unprofitable. We show that for six benchmark programs from the Perfect Club and SPEC 95 suites, parallel program execution times can be improved by as much as 85% on 16 processors of an...
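The inspector-executor idea in the abstract can be illustrated with a minimal sketch. This is not the authors' Polaris implementation, and it does not reproduce either of the two classification methods the paper evaluates. It assumes a simple timing-based rule: run the loop once serially and once in parallel, then keep whichever mode was faster. The names `AdaptiveLoop` and `classify` are hypothetical, and a Python thread pool stands in for a real parallel runtime.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def classify(serial_time, parallel_time):
    """Assumed rule: a loop is profitable only if the parallel run was faster."""
    return "parallel" if parallel_time < serial_time else "serial"


class AdaptiveLoop:
    """Hypothetical wrapper that times each mode once, then sticks with the winner."""

    def __init__(self, body, workers=4, clock=time.perf_counter):
        self.body = body        # function applied to each iteration index
        self.workers = workers  # size of the stand-in thread pool
        self.clock = clock      # injectable timer, useful for testing
        self.timings = {}       # mode -> seconds measured during inspection
        self.mode = None        # "serial" or "parallel" once decided

    def _run_serial(self, n):
        return [self.body(i) for i in range(n)]

    def _run_parallel(self, n):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(self.body, range(n)))

    def _runner(self, mode):
        return self._run_serial if mode == "serial" else self._run_parallel

    def run(self, n):
        # Inspector phase: the first two invocations measure one mode each.
        for mode in ("serial", "parallel"):
            if mode not in self.timings:
                start = self.clock()
                result = self._runner(mode)(n)
                self.timings[mode] = self.clock() - start
                if len(self.timings) == 2:
                    self.mode = classify(self.timings["serial"],
                                         self.timings["parallel"])
                return result
        # Executor phase: reuse the decision made by the inspector.
        return self._runner(self.mode)(n)


if __name__ == "__main__":
    loop = AdaptiveLoop(lambda i: i * i, workers=4)
    for _ in range(3):
        loop.run(1000)
    print("chosen mode:", loop.mode)
```

In this sketch the decision is made once and never revisited. A practical scheme would likely need to re-inspect when the loop's trip count or input changes, since a loop that is unprofitable for small inputs may become profitable for large ones.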

Similar resources

Non-data-communication Overheads in MPI: Analysis on Blue Gene/P

Modern HEC systems, such as Blue Gene/P, rely on achieving high performance by using the parallelism of a massive number of low-frequency/low-power processing cores. This means that the local pre- and post-communication processing required by the MPI stack might not be very fast, owing to the slow processing cores. Similarly, small amounts of serialization within the MPI stack that were acceptabl...

The Importance of Non-Data-Communication Overheads in MPI

With processor speeds no longer doubling every 18-24 months owing to the exponential increase in power consumption and heat dissipation, modern HEC systems tend to rely less on the performance of single processing units. Instead, they rely on achieving high performance by using the parallelism of a massive number of low-frequency/low-power processing cores. Using such low-frequency cores, how...

Making the Compilation “Pipeline” Explicit: Dynamic Compilation Using Trace Tree Serialization

Trace-based compilers operate by dynamically discovering loop headers and then recording and compiling all paths through a loop that are executed with sufficient frequency. The different paths through each loop form a tree, with the loop header at the root, in which common code is shared up-stream. Such trace-trees can be serialized in a specific manner that allows us to organize the compiler p...

Metadata-Based Parallelization of Program Instrumentation

Program instrumentation has a wide variety of useful applications, but tool writers must overcome the challenge of substantial overheads caused by introducing additional code and data into a program. This paper observes that instrumentation usually operates on many discrete, independent data structures, which we call metadata parallelism. We propose to exploit this phenomenon to reduce the over...

Reducing overheads of dynamic scheduling on heterogeneous chips

In recent processor development, we have witnessed the integration of GPUs and CPUs into a single chip. The result of this integration is a reduction of the data communication overheads. This enables an efficient collaboration of both devices in the execution of parallel workloads. In this work, we focus on the problem of efficiently scheduling chunks of iterations of parallel loops among the co...

Publication date: 1999